Insurance redlining -- a complete example continued

12/05/2019

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

Introduction

  • Recall, we are investigating a complicated question:

    • were insurance companies were applying discriminatory business practices on majority non-white neighborhoods in Chicago,
    • or were their practices justifiable based on standard business practices, e.g.,
    • limiting access or raising prices based on justifiable crime statistics, etc?
  • The term “insurance redlining” refers litterally drawing a red line in a map, that excludes certain neighborhoods from services.

  • In the late 1970s, the US Commission on Civil Rights examined charges by several Chicago community organizations that insurance companies were redlining their neighborhoods.

  • Because comprehensive information about individuals being refused homeowners insurance was not available, the number of FAIR plan policies written and renewed in Chicago by zip code for the months of December 1977 through May 1978 was recorded.

  • The FAIR plan was offered by the city of Chicago as a default policy to homeowners who had been rejected by the voluntary market.

  • Information on other variables that might affect insurance writing such as fire and theft rates was also collected at the zip code level.

Chicago (chredlin) dataset

Map of Chicago neighborhoods Courtesy of Peter Fitzgerald CC BY-SA 3.0
  • The data that we have available for our analysis includes the variables:
    1. race – ethnic composition in percentage of minority;
    2. fire – fires per 100 housing units;
    3. theft – thefts per 1000 population;
    4. age – percentage of housing units built before 1939;
    5. involact – new FAIR plan policies and renewals per 100 housing units;
    6. income – median family income in thousands of dollars;
    7. side – north or south side of Chicago.
  • For reference, the South Side of Chicago has long had a reputation and cultural identity built around communities of working class African, Hispanic, Chinese, Irish, and Polish Americans, amongst others.
  • This is due, in part, to historic segregation measures which by the late 1970s were considered illegal.

The ecological fallacy

  • We note here that we do not actually know the ethnicity of the individuals who are denied insurance in this data set.

  • Rather, we only know the ethnic makeup of the zip code where an individual was denied.

  • This is an important difficulty that needs to be considered carefully before starting our analysis.

  • When data are collected at the group level, we may observe a correlation between two variables.

  • The ecological fallacy is concluding that the same correlation holds at the individual level.

  • Particularly, we only have aggregate data on the zip code level describing the ethnic composition of different neighborhoods, and the number of FAIR plan policies per neighborhood.

  • Even if there is a strong correlation between neighborhoods that have a high proportion of minority residents and a higher number of fair plans,

    • we cannot make the conclusion that minority individuals in these neighborhoods were disproportionately rejected on the normal insurance market.

Review of exploratory analysis

  • In our exploratory analysis, we noted that many of the variables are relatively skewed, with the entire first quartile of the response equal to zero.

  • We also saw some issues with multicollinearity in the variables, where the main predictor of interest race has some weak and some strong (anti)-correlations with other variables.

  • The main point of interest is to distinguish these effects, as much as possible, and to determine if there was any systematic bias against neighborhoods with high minority concentrations.

Exploratory analysis – continued

library(faraway)
library(ggplot2)
ggplot(chredlin,aes(race,fire)) + geom_point() +stat_smooth(method="lm") + theme(axis.text=element_text(size=20), axis.title=element_text(size=20,face="bold"))

plot of chunk unnamed-chunk-1

cor(chredlin$fire,chredlin$race)
[1] 0.5927956
  • When plotting fire versus race, we see a somewhat nonlinear but slightly positve trend between the neighborhoods of high minority concentration and higher rate of fires.

Exploratory analysis – continued

ggplot(chredlin,aes(race,theft)) + geom_point() +stat_smooth(method="lm") + theme(axis.text=element_text(size=20), axis.title=element_text(size=20,face="bold"))

plot of chunk unnamed-chunk-2

cor(chredlin$theft,chredlin$race)
[1] 0.2550647
  • When plotting theft versus race, we see a trend of slightly higher rate of theft with neighborhoods of higher minority concentration.

  • We note, also, this plot has an extreme outlier that may be important for the analysis – this might be skewing the trend positively in a way that isn't consistent with the overall behaviour of the city.

  • Genearlly, the correlation is much weaker than with fire, however.

Model selection – continued

  • Plotting median income versus the percent of minority in each neighborhood, we see that these variables are strongly anti-correlated:
ggplot(chredlin,aes(race,income)) + geom_point() +stat_smooth(method="lm") + theme(axis.text=element_text(size=20), axis.title=element_text(size=20,face="bold"))

plot of chunk unnamed-chunk-3

cor(chredlin$income,chredlin$race)
[1] -0.7037328
  • This is due, in part, to the historic segregation and inequality in Chicago, and the influence of these two variables can be very difficult to separate.

Summary of diagnostics

  • We will recall some of the diagnostics here.

  • We note that there is a structural issue arising due to the high number of zero FAIR plans (the response variable), but that there isn't a realistic choice of scale transformation to fix this.

  • Normality and variance assumptions of the error are otherwise OK.

  • We will probably need to evaluate the effect of points of high leverage in the analysis.

  • We wish thus to study the sensitivity of the main parameter of interest “race”, with respect to the leverage points and controlling for covariates in the analysis.

  • We will begin by looking at partial residual plots, which are used for nonlinearity detection when factoring out for covariates.

    • Specifically, we will evaluate the structural uncertainty of the model in terms of the hypothesis, \[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \] is a valid form for the signal.
  • We will also look at the confidence intervals for the parameters.

Evaluating the uncertainty

par(mai=c(1.5,1.5,.5,.5), mgp=c(3,0,0))
lmod <- lm(involact ~ race + fire + theft + age + log(income), data=chredlin)
termplot(lmod, partial.resid=TRUE, terms=1,  cex=3, cex.lab=3, cex.axis=1.5)

plot of chunk unnamed-chunk-4

coefficients(lmod)[2]
       race 
0.009502223 
confint(lmod, parm = "race")
           2.5 %     97.5 %
race 0.004474458 0.01452999

Evaluating the uncertainty – continued

par(mai=c(1.5,1.5,.5,.5), mgp=c(3,0,0))
termplot(lmod, partial.resid=TRUE, terms=2,  cex=3, cex.lab=3, cex.axis=1.5)

plot of chunk unnamed-chunk-5

coefficients(lmod)[3]
      fire 
0.03985604 
confint(lmod, parm = "fire")
          2.5 %     97.5 %
fire 0.02215246 0.05755963

Evaluating the uncertainty – continued

par(mai=c(1.5,1.5,.5,.5), mgp=c(3,0,0))
termplot(lmod, partial.resid=TRUE, terms=3,  cex=3, cex.lab=3, cex.axis=1.5)

plot of chunk unnamed-chunk-6

coefficients(lmod)[4]
      theft 
-0.01029451 
confint(lmod, parm = "theft")
            2.5 %       97.5 %
theft -0.01598536 -0.004603655

Evaluating the uncertainty – continued

par(mai=c(1.5,1.5,.5,.5), mgp=c(3,0,0))
termplot(lmod, partial.resid=TRUE, terms=4,  cex=3, cex.lab=3, cex.axis=1.5)

plot of chunk unnamed-chunk-7

coefficients(lmod)[5]
      age 
0.0083356 
confint(lmod, parm = "age")
          2.5 %     97.5 %
age 0.002793968 0.01387723

Evaluating the uncertainty – continued

par(mai=c(1.5,1.5,.5,.5), mgp=c(3,0,0))
termplot(lmod, partial.resid=TRUE, terms=5,  cex=3, cex.lab=3, cex.axis=1.5)

plot of chunk unnamed-chunk-8

coefficients(lmod)[6]
log(income) 
  0.3457615 
confint(lmod, parm = "log(income)")
                 2.5 %   97.5 %
log(income) -0.4623041 1.153827

Evaluating the uncertainty – continued

  • For each of:
    1. race
    2. fire
    3. age
  • there isn’t a great deal of uncertainty in the effect on the response, and the structure appears to be fine in the partial residuals.
  • Log-income, as noted earlier, is not significant and the uncertainty of the parameter is noted by the confidence interval.

  • We make note, however, that while the confidence intervals for theft are small, the analysis may be strongly affected by the observation of high leverage.

    • Indeed, the coefficient for theft is negative, and this value may be strongly affected by the extreme zip code.
  • This is something we will consider later in the analysis.

Evaluating the uncertainty – continued

  • With the generally good diagnostics, in terms of Gaussianity and constant variance of the errors (with some qualifications) we might make some conclusions.

    • That is, we can put some faith in these p-values and confidence intervals, but this should always be qualified.
  • Loosely speaking, it appears that there is a small, but statistically significant, effect in which neighborhoods with higher concentrations of ethnic minorities have a higher number of FAIR plans, when all other variables are held equal.

  • This effect is not nearly as strong as e.g., the rate of fires with all other variables held equal.

    • However, this indicates that there is a small, but non-random effect where neighborhoods have higher number of FAIR plans based on a variable that shouldn't matter in an insurance application decision.
  • Our analysis is not even nearly complete, and we cannot say if any application for insurance was rejected based upon their ethnic background;

    • nonetheless, this corroborates the claims of the Chicago neighborhood organizations who filed their civil rights lawsuit – this warrants further investigation of the overall effect.

Evaluating the sensitivity of parameters

  • In terms of analyzing the sensitivity of the race parameter with respect to the inclusion or exclusion of other covariates, we can automate this to determine if the effect on the response changes dramatically.

    • This is a larger kind of uncertainty in the model selection itself;
    • if e.g., the parameter changed sign with a different choice of variables, we would have very contradictory explanations of the effect of the concentration of minorities on the number of FAIR plans.
  • This is similar to the case when we used matching in the voting districts in New Hampshire, where we needed to see if an additional covariate was acting as a confounding variable on our analysis.

  • By analyzing the inter-model stability of this parameter, we have a better sense of if this explanation is actually robust.

Evaluating the sensitivity of parameters – continued

  • Suppressing the code to automate this (in Faraway 2nd Edition) we will compare the parameter value and associated p-value for race in each combination of the variables:
                                            beta   pvalue
race                                        0.0139 0.0000
race + fire                                 0.0089 0.0002
race + theft                                0.0141 0.0000
race + age                                  0.0123 0.0000
race + log ( income )                       0.0082 0.0087
race + fire + theft                         0.0082 0.0002
race + fire + age                           0.0089 0.0001
race + fire + log ( income )                0.0070 0.0160
race + theft + age                          0.0128 0.0000
race + theft + log ( income )               0.0084 0.0083
race + age + log ( income )                 0.0099 0.0017
race + fire + theft + age                   0.0081 0.0001
race + fire + theft + log ( income )        0.0073 0.0078
race + fire + age + log ( income )          0.0085 0.0041
race + theft + age + log ( income )         0.0106 0.0010
race + fire + theft + age + log ( income )  0.0095 0.0004
  • The above output shows the parameter for race and the associated p-values for all 16 models.

  • There is some variance in the magnitude of the effect, depending on the other variables we control for, but in no case does the p-value rise above \( 5\% \) or does the parameter change sign.

  • This suggests some uncertainty over the magnitude of the effect, we can be sure that the significance of the effect is not sensitive to the choice of adjusters.

Evaluating the sensitivity of parameters – continued

  • If we suppose that we were able to find models in which the composition of the neighborhood in terms of minorities was not statistically significant, our ability to control for other factors would be more difficult.

  • We would then need to consider more deeply which covariates should be adjusted for and which not.

  • This level of model analysis and selection is at the heart of many real world problems, and we are fortunate as modellers in this case that it is unnecessary.

  • In general, however, this is the reality of real problems you will encounter after this class.

  • Now, we want to analyze the effect of leverage and influence on the model and our interpretation.

  • We recall that there are several influential points, and we wish to determine if their inclusion or exclusion would radically change the interpretation for the parameter for race.

Evaluating the sensitivity of parameters – continued

  • We plot below, using the “dfbeta” function we used in another analysis, the change of the parameter for race when excluding any particular observation:
par(mai=c(1.5,1.5,.5,.5), mgp=c(3,0,0))
plot(dfbeta(lmod)[,2],ylab="Change in race coef", cex=3, cex.lab=3, cex.axis=1.5)
abline(h=0)

plot of chunk unnamed-chunk-9

  • The difference is negligible given the size of the parameter and scale of the response, so we are not especially concerned about the sensitivity of the parameter to observations.

Evaluating the sensitivity of parameters – continued

  • We note that we can also produce a similar plot from the “lm.influence” function, turning the output into a dataframe
diags <- data.frame(lm.influence(lmod)$coef)
  • Using this, we will plot the terms in ggplot2:
ggplot(diags,aes(row.names(diags), race)) + geom_linerange(aes(ymax=0, ymin=race)) + theme(axis.text.x=element_text(angle=90), axis.text=element_text(size=20), axis.title=element_text(size=20,face="bold")) + xlab("ZIP") + geom_hline(yintercept = 0)

plot of chunk unnamed-chunk-11

Evaluating the sensitivity of parameters – continued

  • Likewise, using the diags dataframe, we can produce a two dimensional plot of the differences in each the fire and theft parameters when excluding one of the observations:
ggplot(diags,aes(x=fire,y=theft))+geom_text(label=row.names(diags)) +
  theme(axis.text=element_text(size=20), axis.title=element_text(size=20,face="bold"))

plot of chunk unnamed-chunk-12

Evaluating the sensitivity of parameters – continued

  • As in the earlier diagnostics, we see that the zip codes 60607 and 60610 stick out for high leverage and influence on the parameter values:
chredlin[c("60607","60610"),]
      race fire theft  age involact income side
60607 50.2 39.7   147 83.0      0.9  7.459    n
60610 54.0 34.1    68 52.6      0.3  8.231    n
  • These are both unusually high fire and high theft zip codes – we will remove these two observations simultaneously and see the effect on the paramters:
lmode <- lm(involact ~ race + fire + theft + age + log(income), chredlin,subset=-c(6,24))
sumary(lmode)
              Estimate Std. Error t value  Pr(>|t|)
(Intercept) -0.5767365  1.0800462 -0.5340   0.59638
race         0.0070527  0.0026960  2.6160   0.01259
fire         0.0496474  0.0085701  5.7931 1.004e-06
theft       -0.0064336  0.0043489 -1.4794   0.14708
age          0.0051709  0.0028947  1.7863   0.08182
log(income)  0.1157028  0.4011132  0.2885   0.77453

n = 45, p = 6, Residual SE = 0.30320, R-Squared = 0.8
  • Their exclusion actually makes theft and age no longer significant, though race remains significant.

Evaluating the sensitivity of parameters – continued

  • We have verified that our conclusions are also robust to the exclusion of one or perhaps two cases from the data.

    • If our conclusions depended on including or excluding only one or two observations we would need to be particularly sure of these measurements.
    • In some situations, we might wish to drop such influential cases but this would require strong arguments that such points were in some way exceptional.
  • In any case, it would be very important to disclose this choice in the analysis, and any suppression of the information would be extremely dishonest.

Evaluating predictive power

  • Although this is a model designed for explanation purposes, for completeness in the example, we will look a the predictive power of the model:
x <- model.matrix(lmod)
x0 <- apply(x,2,median)
x0 <- data.frame(t(x0))
# note here that R won't accept the variable that we've rescaled until we re-name it back to its original name... 
colnames(x0)[6] <- "income"
predict(lmod,new=x0,interval="prediction")
          fit       lwr     upr
1 0.003348934 -1.398822 1.40552
predict(lmod,new=x0,interval="confidence")
          fit       lwr      upr
1 0.003348934 -1.225333 1.232031
  • We verify, due to the width of the confidence intervals for the mean and new observations on the median, that the model is worthless for predictions…

Qualifying our analysis: a hypothetical example

  • We have shown very robust evidence for the statistically significant effect of the neighborhood's ethnic composition, with all other variables held equal, having a significant effect in determining the number of FAIR plans in the neighborhood.

  • This doesn't mean conclusively, however, that there was systematic discrimination.

  • If we tried another hypothetical model, supressing theft and the two high influence observations:

modalt <- lm(involact ~ race+fire+log(income),chredlin, subset=-c(6,24))
sumary(modalt)
              Estimate Std. Error t value  Pr(>|t|)
(Intercept)  0.7532557  0.8358798  0.9012   0.37277
race         0.0042061  0.0022757  1.8483   0.07178
fire         0.0510220  0.0084504  6.0378 3.822e-07
log(income) -0.3623825  0.3191620 -1.1354   0.26279

n = 45, p = 4, Residual SE = 0.30919, R-Squared = 0.79
  • the parameter for race is no longer significant.

  • This is just a hypothetical example, because our relatively exhaustive analysis has shown that this is a cherry-picked model, versus the other possible models.

Qualifying our analysis – continued

  • Generally, however, this shows some of the subtlety in statistical analysis like this

    • particularly, independent analyses can lead to a different sequence of models, remediation and refinement, possibly to different conclusions.
  • Part of robust analysis is thus to perform several model selections and the associated diagnostics to deal with the overall uncertainty in the model selection;

    • reasonably we should go back and perform the same steps without log income, for example.
  • The strength of our conclusions will increase if we can show that several different models all have similar interpretations.

  • If there are several reasonable models with different interpretations or conclusions, there is a fundamental issue in the data, and we should be cautious and be skeptical if there is really a signal to be found in the data.

    • A modeler who lacks scientific integrity can similarly explore a large number of models but report only the one that favors a particular conclusion;
    • this is extremely dishonest, and if discovered, often leads to the author being professionally discredited (their work not taken seriously and themselves unemployable).
    • This is likewiswe the case for industrial applications;
    • even if the work is not published, if the analyst consistently makes flimsy models that aren't demonstrably robust, they will be uemployable because they cannot help business operations in any meaninful way.

Qualifying our analysis – continued

  • Generally, we have performed rigorous analysis in the above (pending completion of additional analysis).

  • However, there are still several ways we should be cautious about our conclusions:

    1. Firstly is that this is based on aggregate data – we have assumed that the probability a minority homeowner would obtain a FAIR plan after adjusting for the effect of the other covariates is constant across zip codes.

      If this probability varies in a systematic way (as opposed to random noise about the signal) then our conclusions may be well off the mark.

    2. We have demonstrated statistical significance, but the practical effect is actually fairly small. The largest value of the response (percent FAIR plans) is only \( 2.2\% \) and most other values are much smaller.

      The predicted difference between \( 0\% \) minority and \( 100\% \) minority is about \( 1\% \) , so while we may be confident that some people were affected by discrimination, there may not be so many of them to call this systematic discrimination.

    3. There may be some other latent, confounding variable that we haven’t accounted for that could explain this small difference if included in the model.

    4. We may find other ways to aggregate the data, for which our conclusions will change.

Qualifying our analysis – continued

  • For example, if we fit two separate models for the North Side and South Side of Chicago, we can get very different results:
lmod <- lm(involact ~ race+fire+theft+age+log(income), subset=(side == "s"),chredlin)
sumary(lmod)
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.9946613  1.6949252 -1.1768 0.256470
race         0.0093509  0.0046055  2.0304 0.059285
fire         0.0510812  0.0170314  2.9992 0.008493
theft       -0.0102565  0.0090972 -1.1274 0.276184
age          0.0067778  0.0053061  1.2774 0.219700
log(income)  0.6810543  0.6493336  1.0489 0.309832

n = 22, p = 6, Residual SE = 0.34981, R-Squared = 0.76
  • In the South Side, the parameter for race is no longer significant…

Qualifying our analysis – continued

  • while in the North Side, it is:
lmod <- lm(involact ~ race+fire+theft+age+log(income), subset=(side == "n"),chredlin)
sumary(lmod)
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.7902939  1.7793768 -0.4441  0.66196
race         0.0133200  0.0053903  2.4711  0.02310
fire         0.0246654  0.0154219  1.5994  0.12623
theft       -0.0079383  0.0039815 -1.9938  0.06073
age          0.0090040  0.0046471  1.9375  0.06769
log(income)  0.1666853  0.6233641  0.2674  0.79204

n = 25, p = 6, Residual SE = 0.35115, R-Squared = 0.76
  • Generally, sub-dividing data into smaller and smaller groups, we can dillute the significance.

  • However, as noted before, aggregating data too much causes its own issues – the balance between these is contextual and subjective based on our understanding of the task at hand.

Qualifying our analysis – continued

  • In this context, we might explain the difference in terms of the demographics of the North Side and the South Side:
ggplot(chredlin,aes(side,race)) + geom_point(position = position_jitter(width = .2,height=0)) + theme(axis.text=element_text(size=20), axis.title=element_text(size=20,face="bold"))

plot of chunk unnamed-chunk-20

summary(chredlin$race[chredlin$side=="n"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    1.80   10.00   21.95   24.50   94.40 
summary(chredlin$race[chredlin$side=="s"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   34.27   48.10   49.80   72.92   99.70 
  • With the South Side having a much higher concentration of minorities, the explanatory power of this variable has less effect in the south side, in the presence of other latent variables.

Qualifying our analysis – continued

Population map of race and ethnicity in Chicago 2010 Courtesy of Eric Fischer CC BY-SA 2.0
  • A visualization of the demographics of Chicago is to the left, where the North Side/ South Side divide is clearly expressed.
  • This is a visualization of the 2010 census, where each dot represents 25 residents.
  • The color coding is as follows:
    1. Red: non-Hispanic Caucasian
    2. Blue: African American
    3. Orange: Hispanic
    4. Green: Asian
    5. Yellow: other

Summary of the complete example

  • We performed initial, exploratory analysis, checking for structure in the data, and in the plots of variables versus the response.

  • We fit a model based on a mix of criterion methods and some qualitative judgement.

  • We performed diagnostics to see how our assumptions might be breaking down – aside from an unavoidable issue with the many zero observations for the FAIR plan, it was a well functioning model.

  • We evaluated the partial residuals, and confidence intervals for each of the parameters in our model.

    • This gave us some indication of the uncertainty of this parameter and the structure of the model with respect to it.
  • As this is an explanatory model, we were interested in “predicting” the parameter for “race”;

    • we wanted to understand the sensitivity of this parameter, so we analyzed the p-value and the parameter value for qualitative differences versus all possible covariates in the data.
    • This illustrated a deeper level of the model uncertainty, though one which we seemed to be relatively robust to.
  • We also demonstrated robustness to the exclusion of observations, and checked formally for outliers that could affect the fit.

  • Just for fun, we saw that it has awful predictive power…

  • We made a final assessment of the uncertainty that we could not account for with our data.

Summary of the complete example – continued

  • After all this analysis, everything we say needs to be hedged with “ifs” and “buts” and qualified carefully.

  • This is the reality of statistical modeling, where causual conclusions are extremely difficult to draw.

  • To quote Farway, quoting Winston Churchill:

    “Indeed, it has been said that Democracy is the worst form of government, except all those other forms that have been tried from time to time.”

  • We might say the same about statistics with respect to how it helps us reason in the face of uncertainty.

    • It is not entirely satisfying but the alternatives are worse.